Backfitting for large scale crossed random effects regressions
Authors
Abstract
Regression models with crossed random effect errors can be very expensive to compute. The cost of both generalized least squares and Gibbs sampling can easily grow as N^(3/2) (or worse) for N observations. Papaspiliopoulos, Roberts and Zanella (Biometrika 107 (2020) 25–40) present a collapsed Gibbs sampler that costs O(N), but under an extremely stringent sampling model. We propose a backfitting algorithm to compute a generalized least squares estimate and prove that it costs O(N). A critical part of the proof is in ensuring that the number of iterations required is O(1), which follows from keeping a certain matrix norm below 1 − δ for some δ > 0. Our conditions are greatly relaxed compared to those for the collapsed Gibbs sampler, though still strict. Empirically, the backfitting algorithm has a norm below 1 − δ under conditions that are less strict than those in our assumptions. We illustrate the new algorithm on a ratings data set from Stitch Fix.
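The backfitting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes an intercept-only model y = mu + a[row] + b[col] + noise, and it treats the variance ratios lam_a = sigma_e^2 / sigma_a^2 and lam_b = sigma_e^2 / sigma_b^2 as known. The function name backfit_crossed and all parameter names are hypothetical. Each sweep touches every observation a constant number of times, so one sweep costs O(N); the number of sweeps needed is the quantity the paper's norm condition controls.

```python
import numpy as np

def backfit_crossed(rows, cols, y, lam_a, lam_b, n_iter=500, tol=1e-10):
    """Backfit y = mu + a[rows] + b[cols] + noise with shrunken effects.

    Illustrative assumption: lam_a and lam_b are known variance ratios.
    Returns (mu, a, b, number_of_sweeps_used).
    """
    rows = np.asarray(rows)
    cols = np.asarray(cols)
    y = np.asarray(y, dtype=float)
    n_rows = rows.max() + 1
    n_cols = cols.max() + 1
    # Number of observations at each row level and each column level.
    cnt_a = np.bincount(rows, minlength=n_rows)
    cnt_b = np.bincount(cols, minlength=n_cols)

    mu = y.mean()
    a = np.zeros(n_rows)
    b = np.zeros(n_cols)
    for sweep in range(1, n_iter + 1):
        mu_old, a_old, b_old = mu, a.copy(), b.copy()
        # Row effects: shrunken mean of the residuals that exclude a.
        resid = y - mu - b[cols]
        a = np.bincount(rows, weights=resid, minlength=n_rows) / (cnt_a + lam_a)
        # Column effects: same update with the roles of a and b swapped.
        resid = y - mu - a[rows]
        b = np.bincount(cols, weights=resid, minlength=n_cols) / (cnt_b + lam_b)
        # Intercept: unpenalized mean of the remaining residuals.
        mu = np.mean(y - a[rows] - b[cols])
        change = max(abs(mu - mu_old),
                     np.abs(a - a_old).max(),
                     np.abs(b - b_old).max())
        if change < tol:
            break
    return mu, a, b, sweep
```

Under these assumptions the sweeps are block coordinate descent on the penalized least squares criterion sum (y - mu - a - b)^2 + lam_a * |a|^2 + lam_b * |b|^2, so the fixed point can be checked against a direct solve on small problems.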
Similar resources
Laws of Large Numbers for Random Linear
The computational solution of large scale linear programming problems contains various difficulties. One of the difficulties is to ensure numerical stability. There is another difficulty of a different nature, namely that the original data contain errors as well. In this paper, we show that the effect of the random errors in the original data has a diminishing tendency for the optimal value as the...
Efficient moment calculations for variance components in large unbalanced crossed random effects models
Large crossed data sets, described by generalized linear mixed models, have become increasingly common and provide challenges for statistical analysis. At very large sizes it becomes desirable to have the computational costs of estimation, inference and prediction (both space and time) grow at most linearly with sample size. Both traditional maximum likelihood estimation and numerous Markov cha...
"Influence sketching": Finding influential samples in large-scale regressions
There is an especially strong need in modern large-scale data analysis to prioritize samples for manual inspection. For example, the inspection could target important mislabeled samples or key vulnerabilities exploitable by an adversarial attack. In order to solve the "needle in the haystack" problem of which samples to inspect, we develop a new scalable version of Cook's distance, a classical s...
Random Access Support for Large Scale VoD
The implementation of an interactive Video on Demand service is conventionally expensive because each viewer must be allocated a video stream. On the other hand, video streams can be multicast to a number of viewers, reducing the system resources required. This paper proposes a scheme, called Random Access PMC (RAPMC), which reduces the requirements for supporting an interactive VoD service. It is shown tha...
Journal
Journal title: Annals of Statistics
Year: 2022
ISSN: 0090-5364, 2168-8966
DOI: https://doi.org/10.1214/21-aos2121